Abstract. In the constraint programming framework, state-of-the-art
static and dynamic decomposition techniques are hard to apply to prob- 
lems with complete initial constraint graphs. For such problems, we pro- 
pose a hybrid approach of these techniques in the presence of global 
constraints. In particular, we solve the subgraph isomorphism problem. 
Further we design specific heuristics for this hard problem, exploiting its 
special structure to achieve decomposition. The underlying idea is to pre- 
compute a static heuristic on a subset of its constraint network, to follow 
this static ordering until a first problem decomposition is available, and 
to switch afterwards to a fully propagated, dynamically decomposing 
search. Experimental results show that, for sparse graphs, our decom- 
position method solves more instances than dedicated, state-of-the-art 
matching algorithms or standard constraint programming approaches. 



1 Introduction 

Graph pattern matching is a central application in many fields [1] and can be
successfully modeled using constraint programming [12,17,19]. Here, we stress how
to apply decomposition techniques to solve the Subgraph Isomorphism Problem 
(SIP) in order to outperform the dedicated state-of-the-art algorithm. 

Decomposition techniques are an instantiation of the divide and conquer 
paradigm to overcome redundant work for independent partial problems. A con- 
straint problem (CSP) can be associated with its constraint network, which 
represents the active constraints together with their relationship. During search, 
the constraint network loses structure as variables are instantiated and
constraints are entailed by domain propagation. The constraint network can possibly
consist of two or more independent components, leading to redundant work due 
to the repeated computation and combination of the corresponding independent 
partial solutions. The key to avoiding this is decomposition, which consists of two
steps. The first step detects the possible problem decompositions by examining
the underlying constraint network for independent components. The second
step exploits these independent components by solving the corresponding partial
CSPs independently, and combines their solutions without redundant work.
Decomposition can occur at any node of the search tree, i.e. at the root node 
or dynamically during search. In constraint programming, decomposition
techniques have been studied through the concept of AND/OR search [15]. AND/OR
search is sensitive to problem decomposition: it extends the classical OR search
nodes with AND nodes, which combine the search subtrees of independent
subproblems. The size of the minimal AND/OR search tree is exponential in the
tree width, while the size of the minimal OR search tree is exponential in the
path width; the AND/OR search tree is never larger than the OR search tree.

The check for decomposition is usually done in one of two ways. In the first,
only the initial constraint network is statically analyzed, resulting in a
so-called pseudo-tree. This structure encodes both the static search heuristic
and the information on when a subproblem is decomposable [5]. The second
possibility is to consider the dynamic changes of the constraint network by
analyzing it at each node during the search [3]. Such a dynamic approach is
better suited if strong constraint propagation (e.g. by AC) is present, but
comes at the cost of additional computations.

A major problem of decomposition techniques is their problem specificity.
Without good heuristics, decomposition may occur seldom or very late, so that
the computational overhead of checking for it is too high for an efficient
application. Nevertheless, some approaches have been shown to be more general
by applying dedicated algorithms, e.g. graph separators or cycle cutset
conditioning [10,15,16].

However, those (usually static) algorithms fail to compute good heuristics 
on problems with global constraints, which have an initially complete constraint 
graph. Indeed, such algorithms presuppose a sparse constraint graph. In the sub- 
graph isomorphism problem (SIP), for example, the initial constraint graph is 
complete due to the presence of a global alldiff-constraint. This prevents cycle
cutset and graph separator algorithms from being applied. A further drawback of a
static analysis is the non-predictable decomposability of the constraint network 
achieved by constraint entailment through propagation. To exploit this, a dy- 
namic analysis of the problem structure during the search is necessary. This is 
of high importance for SAT- [13] and CSP-solving [3]. Unfortunately, a dynamic 
analysis requires significant additional work that slows down the search process 
once more. 

In this paper we show how to overcome those shortcomings by combining 
static and dynamic decomposition approaches to take advantage of decomposi- 
tion for the hard problem of SIP. A combination yields a balance between the 
fast static analysis and the needed full propagation exploited by dynamic search 
strategies in the presence of global constraints. The underlying idea is to follow 
the static ordering until a first problem decomposition is available (or likely) and 
to switch afterwards to a fully propagated decomposing search. For the latter, we
consider only a binary constraint representation inside the constraint network
in order to compute a good decomposition-enforcing heuristic. As shown in the
experiments, this idea is a key point for an efficient application of a decomposing
search (such as AND/OR) to the SIP.

To face the problem of graph pattern matching [1], many different types of
algorithms have been proposed, ranging from general methods to specific
algorithms for particular types of graphs. The state-of-the-art approach is the
dedicated VF-algorithm, freely available in the C++ vflib library [2]. In constraint
programming, several authors [12,17] have shown that graph matching can be
formulated as a CSP, and argued that constraint programming could
be a powerful tool to handle its combinatorial complexity. Our modeling [19] is 
based on these works. In [115], we showed that a CSP approach is competitive 
with dedicated algorithms over a graph database representing graphs with various
topologies. Regarding decomposition, Valiente et al. [18] have shown how
to use decomposition techniques in order to speed up subgraph homeomorphism.
[18] states that, if the initial pattern graph is made of several disconnected
components, then matching each component separately is equivalent to matching all
of them together. Specific algorithms are also demonstrated. Our work can be 
seen as an extension to this work. We consider the subgraph isomorphism prob- 
lem instead of the subgraph homeomorphism problem. The latter case is easier 
as the constraint graph is made only of the initial pattern graph. Moreover, we 
apply the decomposition dynamically, whereas [18] decomposes only statically on
the initial pattern graph. 

Objectives and results - In this paper we study the limits of the direct 
application of state-of-the-art (static and dynamic) decomposition techniques 
for problems with global constraints; we show that such a direct application 
is useless for SIP. We develop a hybrid decomposition approach for such prob- 
lems and design specific search heuristics for SIP, exploiting the structure of the 
problem to achieve decomposition. We show that the CP approach using the 
proposed decomposition techniques outperforms the state-of-the-art algorithms, 
and solves more instances on some classes of problems (sparse instances with 
many solutions). 

The paper is structured as follows. Section 2 introduces a decomposition
method able to detect decomposition at any stage during the search. In Section
3, the proposed decomposition method is applied and specialized to SIP.
Experimental results assessing the efficiency of our approach are presented in
Section 4. Section 5 concludes the paper.

2 Decomposition 

In this section we show how to define and detect decomposition during search. 
Sections 2.1 and 2.2 define a decomposition method able to detect decomposition
at any state during search, considering that we do not know a priori when
decomposition occurs. Section 2.3 shows that our method is able to compute
the same decompositions as the AND/OR search framework [15], where the
search is precomputed on a graph representation of the constraint network, and
decomposition events are known in advance. The AND/OR search method has
been shown to be very attractive for a large class of constraint networks. But as we
will see in Section 3, our method is suited for the SIP while the AND/OR method
is not applicable because the decomposition events cannot be precomputed.

2.1 Preliminary 

A Constraint Satisfaction Problem (CSP) $P$ is a triple $(X, \mathcal{D}, C)$ where
$X = \{x_1, \ldots, x_n\}$ is a set of variables, $\mathcal{D} = \{D_1, \ldots, D_n\}$
is a set of domains (each a finite set of values), each variable $x_i$ is associated
with a domain $D_i$, and $C$ is a finite set of constraints with
$scope(c) \subseteq X$ for all $c \in C$, where $scope(c)$ is the set of variables
involved in the constraint $c$. A constraint $c$ over a set of variables defines a
relation between the variables. A solution of the CSP is an assignment of each
variable in $X$ to one value in its associated domain so that no constraint
$c \in C$ is violated. We denote by $Sol(P)$ the set of solutions of a CSP $P$.

A partial CSP $\hat{P}$ of a CSP $P = (X, \mathcal{D}, C)$ is a CSP
$(\hat{X}, \hat{\mathcal{D}}, \hat{C})$ where $\hat{X} \subseteq X$,
$\forall \hat{D}_k \in \hat{\mathcal{D}} : \hat{D}_k \subseteq D_k$, and
$\hat{C} \subseteq C$. Note that since $\hat{P}$ is a CSP, we have
$scope(c) \subseteq \hat{X}$ for all $c \in \hat{C}$.
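
As a minimal illustration of these definitions (a sketch only, not the solver used in this paper), a CSP can be stored as a dictionary of domains plus a list of (scope, predicate) constraints, and $Sol(P)$ can be enumerated by brute force. The names `solutions`, `domains` and `constraints` are chosen here for illustration.

```python
from itertools import product

def solutions(domains, constraints):
    """Enumerate Sol(P) by brute force.

    domains: dict mapping each variable to a set of values.
    constraints: list of (scope, predicate) pairs; the predicate takes the
    values of the scope variables, in order, and returns True if allowed.
    """
    variables = sorted(domains)
    result = []
    for values in product(*(sorted(domains[v]) for v in variables)):
        assignment = dict(zip(variables, values))
        if all(pred(*(assignment[v] for v in scope))
               for scope, pred in constraints):
            result.append(assignment)
    return result

# Toy CSP: x < y, with z unconstrained.
domains = {"x": {1, 2}, "y": {1, 2}, "z": {0, 1}}
constraints = [(("x", "y"), lambda a, b: a < b)]
sols = solutions(domains, constraints)
```

In this toy instance the variable z shares no constraint with x and y, which is exactly the situation the following subsection exploits.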

2.2 Decomposing CSPs and graphs 

This subsection defines the notion of decomposition for a CSP. A CSP is de- 
composable into partial CSPs if the CSP and its decomposition have the same 
solutions. 

Definition 1. A CSP $P$ is decomposable into partial CSPs $P_1, \ldots, P_k$ iff:

- $\forall s \in Sol(P) : \exists s_1, \ldots, s_k \in Sol(P_1), \ldots, Sol(P_k) : s = \bigcup_{i \in [1,k]} s_i$,

- $\forall s_1, \ldots, s_k \in Sol(P_1), \ldots, Sol(P_k) : \exists s \in Sol(P) : s = \bigcup_{i \in [1,k]} s_i$.

This general definition of decomposition can be instantiated to two practical 
cases. The first definition corresponds to the direct intuition of a decomposition: 
a CSP is decomposable if it can be split into disjoint partial CSPs. It is called 
0-decomposability as no variables are shared between the partial CSPs.

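Definition 2. A CSP $P = (X, \mathcal{D}, C)$ is 0-decomposable into partial CSPs
$P_1, \ldots, P_k$ with $P_i = (X_i, \mathcal{D}_i, C_i)$ iff
$\forall\, 1 \le i < j \le k : X_i \cap X_j = \emptyset$,
$\bigcup_{i \in [1,k]} X_i = X$, $\bigcup_{i \in [1,k]} \mathcal{D}_i = \mathcal{D}$, and
$\bigcup_{i \in [1,k]} C_i = C$.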

The second definition finds more decompositions by allowing the partial CSPs 
to have instantiated variables in common. It is called 1-decomposability as
variables shared between the partial CSPs have a domain of size 1.

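Definition 3. A CSP $P = (X, \mathcal{D}, C)$ is 1-decomposable into partial CSPs
$P_1, \ldots, P_k$ with $P_i = (X_i, \mathcal{D}_i, C_i)$ iff
$\forall\, 1 \le i < j \le k : x \in (X_i \cap X_j) \Rightarrow |D_x| = 1$,
$\bigcup_{i \in [1,k]} X_i = X$, $\bigcup_{i \in [1,k]} \mathcal{D}_i = \mathcal{D}$, and
$\bigcup_{i \in [1,k]} C_i = C$.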

The relationship with the general definition is direct. If a CSP $P$ is
0-decomposable or 1-decomposable into partial CSPs $P_1, \ldots, P_k$, then $P$ is
decomposable into partial CSPs $P_1, \ldots, P_k$. From Definitions 2 and 3 it
follows further:

Property 1. If a CSP $P = (X, \mathcal{D}, C)$ is 0-decomposable into
$P_1, \ldots, P_k$, then $P$ is 1-decomposable into $P_1, \ldots, P_k$. Further, $P$
might be 1-decomposable into $P'_1, \ldots, P'_{k'}$ with $k' > k$ via overlapping
partial problems $P'_i$.

Redundant computation during CSP-solving is performed whenever a CSP is
0- or 1-decomposable into $k$ partial CSPs $P_1, \ldots, P_k$. For instance, if the
solutions of $P_1$ are computed first, then for each solution of $P_1$ all solutions
of $P_2, \ldots, P_k$ are computed again. Therefore, $P_2, \ldots, P_k$ are solved
$|Sol(P_1)|$ times, and this overhead can be exponential in the size of the CSP.
This can be avoided by solving the partial problems independently. The necessary
detection of the CSP-decomposition into independent partial CSPs can be performed
through the concept of constraint graphs.
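
The overhead can be made concrete by counting how often the second partial problem is solved. The sketch below assumes two independent partial CSPs represented only by their solution lists; `count_naive` and `count_decomposed` are illustrative names, not part of any solver.

```python
def count_naive(sols_p1, solve_p2):
    """Chronological search: P2 is re-solved under every solution of P1."""
    total, calls = 0, 0
    for _ in sols_p1:
        calls += 1
        total += len(solve_p2())
    return total, calls

def count_decomposed(sols_p1, solve_p2):
    """Decomposing search: P2 is solved once and the counts are multiplied."""
    return len(sols_p1) * len(solve_p2()), 1

sols_p1 = [{"x": v} for v in range(4)]           # |Sol(P1)| = 4
solve_p2 = lambda: [{"y": v} for v in range(3)]  # |Sol(P2)| = 3

naive_total, naive_calls = count_naive(sols_p1, solve_p2)
dec_total, dec_calls = count_decomposed(sols_p1, solve_p2)
```

Both strategies report the same number of solutions, but the naive one solves the second problem $|Sol(P_1)|$ times.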

A graph $G = (V, E)$ consists of a vertex/node set $V$ and an edge set
$E \subseteq V \times V$, where an edge $(u, v)$ is a pair of nodes. The vertices
$u$ and $v$ are the endpoints of the edge $(u, v)$. We consider directed and
undirected graphs. A subgraph of a graph $G = (V, E)$ is a graph $G' = (V', E')$
with $V' \subseteq V$ and $E' \subseteq E$ such that
$\forall (u, v) \in E' : u, v \in V'$. A graph $G$ is said to be singly connected
if and only if there is at most one simple path between any two nodes in $G$.

Definition 4. The constraint graph of a (partial) CSP $P = (X, \mathcal{D}, C)$ is an
undirected graph $G_P = (V, E)$ where $V = X$ and
$E = \{(x_i, x_j) \mid \exists c \in C : x_i, x_j \in scope(c)\}$.

Note that all variables in the scope of one constraint form a clique in $G_P$.
This constraint graph is also called the primal graph [J. There is a standard 
syntactic way of decomposing a CSP, based on its constraint graph. 

Definition 5. A graph $G = (V, E)$ is decomposable into $k$ subgraphs
$G_1, \ldots, G_k$ iff $\forall\, 1 \le i < j \le k : V_i \cap V_j = \emptyset$,
$\bigcup_{i \in [1,k]} V_i = V$, and $\bigcup_{i \in [1,k]} E_i = E$.

Property 2 shows that one has to compute the disjoint components of the
constraint graph to detect independent CSPs. This can be done in linear time.
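
A minimal sketch of this detection step, assuming constraints are given only by their scopes: each scope contributes a clique to the constraint graph, and a depth-first traversal yields the disjoint components. The function names are illustrative.

```python
from itertools import combinations

def primal_graph(variables, scopes):
    """Adjacency sets of the constraint graph: each scope forms a clique."""
    adj = {v: set() for v in variables}
    for scope in scopes:
        for a, b in combinations(scope, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj

def components(adj):
    """Connected components by iterative DFS, linear in |V| + |E|."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        comps.append(comp)
    return comps

# Two constraints with disjoint scopes give two independent partial CSPs.
adj = primal_graph("abcde", [("a", "b", "c"), ("d", "e")])
parts = components(adj)
```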

Property 2. Given a CSP $P = (X, \mathcal{D}, C)$ with its constraint graph $G$, for
all $k \ge 1$, the constraint graph $G$ of $P$ is decomposable into $G_1, \ldots, G_k$
iff $P$ is 0-decomposable into $P_1, \ldots, P_k$ iff $P$ is 1-decomposable into
$P'_1, \ldots, P'_m$ with $m \ge k$.

Proof - The first iff is straightforward. For the second iff, we can construct a
1-decomposition $P'_1, \ldots, P'_m$ of $P$ from a decomposition $G_1, \ldots, G_k$ of
$G$, with $m \ge k$. The construction is described for the case $k = 1$ (i.e.
$P_1 = P$), and can be easily generalized. Let $G = (V, E)$ be the constraint graph
of $P$. Let $V_s = \{x \in V \mid |D_x| = 1\}$. Transform $G$ into $G'$ where $G'$ is
the graph $G$ without variables with a singleton domain. More formally,
$G' = (V', E')$ with $V' = V \setminus V_s$ and $E' = (V' \times V') \cap E$. Suppose
$G'$ is decomposable into $G'_1, \ldots, G'_m$ ($m \ge 1$). Then, nodes associated to
variables with a singleton domain and their associated edges are added to the
$G'_i$, giving $G^1_i = (V^1_i, E^1_i)$. More formally, $G^1_i = (V^1_i, E^1_i)$ where
$V^1_i = V'_i \cup V_s$ and $E^1_i = (V^1_i \times V^1_i) \cap E$. The graphs
$G^1_1, \ldots, G^1_m$ are the constraint graphs of the partial CSPs $P'_i$ of the
1-decomposition of $P$. ■

The above property is especially useful when $k = 1$. In this case, the
0-decomposition does not decompose the CSP, while 1-decomposition may
decompose it.
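
The construction used in the proof can be sketched as follows, assuming the constraint graph is given as adjacency sets and the current domains as sets of values. The function returns only the variable sets of the partial CSPs; all names are illustrative.

```python
def components(adj, nodes):
    """Connected components of the subgraph of adj induced by nodes."""
    nodes, seen, comps = set(nodes), set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nxt in adj[node] & nodes:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        comps.append(comp)
    return comps

def one_decomposition(adj, domains):
    """Variable sets of the partial CSPs of a 1-decomposition.

    Follows the proof: drop the singleton-domain variables (V_s), split
    the remaining graph into components, then add V_s back to each one.
    """
    singletons = {v for v in adj if len(domains[v]) == 1}
    free = set(adj) - singletons
    return [comp | singletons for comp in components(adj, free)]

# Path a - b - c: connected, so 0-decomposition finds nothing (k = 1),
# but b is assigned, so the CSP is 1-decomposable into two parts.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
domains = {"a": {1, 2}, "b": {3}, "c": {1, 2}}
parts = one_decomposition(adj, domains)
```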

2.3 Relationship with AND/OR search tree 

Another approach to define decomposable CSPs is to use the concept of AND/OR
search spaces defined with pseudo-trees [15].

Definition 6. Given an undirected graph G = (V, E), a directed rooted tree 
T = (V, E') defined on all its nodes is called pseudo-tree of G if any arc of 
E which is not included in E' is a back-arc, namely it connects a node to an 
ancestor in T. 

Definition 7. Given a CSP $P = (X, \mathcal{D}, C)$, its constraint graph $G_P$ and a
pseudo-tree $T_P$ of $G_P$, the associated AND/OR search tree has alternating levels
of OR nodes and AND nodes. The OR nodes are labeled $x_i$ and correspond to
variables. The AND nodes are labeled $\langle x_i, v_k \rangle$ and correspond to
assignments of the values $v_k$ in the domains of the variables. The root of the
AND/OR search tree is an OR node, labeled with the root of the pseudo-tree $T_P$.
The children of an OR node $x_i$ are AND nodes labeled with assignments
$\langle x_i, v_k \rangle$, consistent along the path from the root. The children of
an AND node $\langle x_i, v_k \rangle$ are OR nodes labeled with the children of
variable $x_i$ in $T_P$.

Semantically, the OR states represent alternative solutions, whereas the AND 
nodes represent the problem decompositions into independent partial problems, 
all of which need to be solved. When the pseudo-tree is a chain, the AND/OR 
search tree coincides with the regular OR search tree. 
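
One standard way to obtain a pseudo-tree, used here purely as an illustration, is a depth-first search tree: in an undirected graph every non-tree edge of a DFS tree connects a node to one of its ancestors. The sketch below assumes a connected graph given as adjacency sets and contrasts a star graph with a complete graph.

```python
def dfs_pseudo_tree(adj, root):
    """Parent pointers of a DFS spanning tree rooted at root.

    Every non-tree edge of an undirected DFS tree is a back-arc, so the
    tree is a pseudo-tree. Assumes the graph is connected.
    """
    parent = {root: None}

    def visit(node):
        for nxt in sorted(adj[node]):
            if nxt not in parent:
                parent[nxt] = node
                visit(nxt)

    visit(root)
    return parent

def is_chain(parent):
    """True if no node of the tree has more than one child."""
    children = {}
    for node, par in parent.items():
        if par is not None:
            children[par] = children.get(par, 0) + 1
    return all(count <= 1 for count in children.values())

# Star graph: the centre has three independent subtrees below it.
star = {"c": {"x", "y", "z"}, "x": {"c"}, "y": {"c"}, "z": {"c"}}
# Complete graph K4: the DFS tree degenerates to a chain.
k4 = {v: set("abcd") - {v} for v in "abcd"}

star_tree = dfs_pseudo_tree(star, "c")
k4_tree = dfs_pseudo_tree(k4, "a")
```

For the star, assigning the centre exposes three independent subproblems; for the complete graph, the chain offers no AND node with more than one child.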

Following the ordering induced by a given pseudo-tree $T_P$ of the constraint
graph of a CSP $P$, the notion of 1-decomposability coincides with the
decompositions induced by an AND/OR search.

Property 3. Given a CSP $P = (X, \mathcal{D}, C)$, a pseudo-tree $T_P$ over the
constraint graph of $P$ and a path $p$ of length $l$ ($l \ge 1$) from the root node
of $T_P$ to an AND node $p_l$, the CSP $P$ where all variables in the path $p$ are
assigned is 1-decomposable into $P_1, \ldots, P_k$, where $k$ is the number of OR
successors in $T_P$ of the end node $p_l$.

Proof - Let $y_1, \ldots, y_k$ ($k \ge 1$) be the OR successor nodes of the end node
$p_l$ in $T_P$. We denote by $tree(y_i)$ the tree rooted at $y_i$ in $T_P$. Let
$X_s = \{v \in X \mid v \in p\}$. Then build the partial CSPs
$P_i = (X_i, \mathcal{D}_i, C_i)$ ($i \in [1,k]$):

$X_i = X_s \cup \{v \in X \mid v \in tree(y_i)\}$

$\mathcal{D}_i = \{D_x \in \mathcal{D} \mid x \in X_i\}$

$C_i = \{c \in C \mid scope(c) \subseteq X_i\}$.

It is clear that $\bigcup_{i \in [1,k]} C_i = C$ since there exists no constraint
between two different $tree(y_i)$ in $T_P$, by definition of a pseudo-tree. ■

As will be explained in the next section, neither static nor dynamic AND/OR
search is suited for our particular problem. In SIP, the constraint graph is
complete, and thus the pseudo-tree is a chain, leading to an AND/OR search tree
equivalent to an OR search tree. However, the CSP $P$ becomes 1-decomposable
during search, and a dynamic framework is needed in order to check decomposition
at any state during the search. But this is computationally very expensive,
as we will show in Section 4.

3 Applying decomposition to SIP 

3.1 Subgraph Isomorphism Problem Definition 

A subgraph isomorphism problem between a pattern graph $G_p = (V_p, E_p)$ and
a target graph $G_t = (V_t, E_t)$ consists in deciding whether $G_p$ is isomorphic
to some subgraph of $G_t$. More precisely, one should find an injective function
$f : V_p \to V_t$ such that $\forall (u, v) \in E_p : (f(u), f(v)) \in E_t$. This
NP-hard problem is also called the subgraph monomorphism problem or subgraph
matching in the literature. The function $f$ is called a subgraph matching
function. We assume the graphs are directed. Undirected graphs are a particular
case where undirected edges are replaced by two directed arcs.
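
The definition can be checked directly on tiny graphs by enumerating all injective functions. This brute-force sketch is exponential and is meant only to make the definition concrete; it is unrelated to the CSP model or to vflib, and its names are illustrative.

```python
from itertools import permutations

def subgraph_matchings(pattern_nodes, pattern_arcs, target_nodes, target_arcs):
    """Yield every injective f with (f(u), f(v)) in E_t for all (u, v) in E_p."""
    target_arcs = set(target_arcs)
    pattern_nodes = list(pattern_nodes)
    # permutations(...) enumerates exactly the injective assignments.
    for image in permutations(target_nodes, len(pattern_nodes)):
        f = dict(zip(pattern_nodes, image))
        if all((f[u], f[v]) in target_arcs for u, v in pattern_arcs):
            yield f

# Pattern: a directed path 0 -> 1 -> 2. Target: a directed 4-cycle.
pattern_arcs = [(0, 1), (1, 2)]
target_arcs = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
matches = list(subgraph_matchings(range(3), pattern_arcs, "abcd", target_arcs))
```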

The CSP model $P = (X, \mathcal{D}, C)$ of subgraph isomorphism should represent
a total function $f : V_p \to V_t$. This total function can be modeled with
$X = x_1, \ldots, x_n$ with $x_i$ an FD variable corresponding to the $i$-th node of
$G_p$ and $D(x_i) = V_t$. The injective condition is modeled with an
$alldiff(x_1, \ldots, x_n)$ global constraint. The isomorphism condition is
translated into a set of $n$ $k$-ary constraints $MC_i \equiv (x_i, x_j) \in E_t$ for
all $x_i \in V_p$. Given the above modelling, the constraint graph of the CSP,
called the SIP constraint graph, is the graph $G_P = (V_P, E_P)$ where $V_P = X$ and
$E_P = E_p \cup E_{\neq}$. Note that $E_p$ represents all propagations of the
$MC_i$ constraints while $E_{\neq}$ depicts the global alldiff-constraint, i.e. a
clique ($E_{\neq} = V_p \times V_p$). Therefore, the SIP-CSP consists of global
constraints only, which prevents decomposition using a static AND/OR search.
Implementation, comparison with dedicated algorithms, and extension to subgraph
isomorphism and to graph and function computation domains can be found in [15].

3.2 Decomposing SIP 

This subsection explains how to decompose the SIP problem. We first show why 
static AND/OR search fails by studying the SIP constraint graph.

Static AND/OR Search: Because of the alldiff-constraint, the SIP constraint
graph corresponds to the complete graph $K_{|V_p|}$. The pseudo-tree computed on
the constraint graph of any SIP instance is a chain, detecting no decomposition
at all. Moreover, the initial SIP constraint graph is not 1-decomposable.
Therefore a static analysis of the SIP-CSP yields no decomposition at all and is
not applicable.

Decomposition seems difficult to achieve. However, as variables are assigned
during search, 1-decomposition may occur at some nodes of the search tree. A
dynamic detection of 1-decomposition at different nodes of the search tree gives
a first way of detecting decomposition for the SIP.

Dynamic AND/OR Search: A dynamic analysis of the SIP constraint graph, 
as done for dynamic AND/OR search, takes care of possible constraint
entailments and propagation results. It is therefore very useful for a strongly
propagated CSP. The main drawback is the slowdown due to the additional
propagation and dynamic decomposition checks. Further, the SIP constraint graph is
still a complete one and does not allow for decomposition. 

Our 1-decomposition removes assigned variables in the decomposition process.
One could also remove entailed constraints, thus leading to more decomposition.
This can easily be done for the alldiff-constraint by removing an edge
$(x_i, x_j) \in E_P$ representing $x_i \neq x_j$ when $D_i \cap D_j = \emptyset$
($i \neq j$). In the following, we redefine the constraint graph of a SIP as a
constraint graph for the morphism constraints together with a dynamic constraint
graph of the alldiff-constraint.

Definition 8. Given the CSP $P = (X, \mathcal{D}, C)$ of a SIP instance, its SIP
constraint graph is the undirected graph $G = (V, E_{MC} \cup E_{\neq})$, where
$V = X$, $E_{MC} = \{(x_i, x_j) \in E_p \mid x_i, x_j \in X\}$ and
$E_{\neq} = \{(x_i, x_j) \in X \times X \mid D_i \cap D_j \neq \emptyset\}$.

Given the particular structure of a SIP constraint graph, it is possible to 
specialize and simplify the detection of 1-decomposition. 

Property 4. Let $P = (X, \mathcal{D}, C)$ be a CSP model of a SIP instance, and let
$G = (V, E_{MC} \cup E_{\neq})$ be its SIP constraint graph. Let $M = (V', E')$ be
the constraint graph without assigned variables, i.e. with
$V' = \{x \in X \mid |D_x| > 1\}$ and $E' = (V' \times V') \cap E_{MC}$. Then $P$ is
1-decomposable into $P_1, \ldots, P_m$ iff $M$ is decomposable into
$M_1, \ldots, M_m$ and $D(M_i) \cap D(M_j) = \emptyset$ ($1 \le i < j \le m$), with
$D(M_i)$ the union of the domains of the variables associated to the nodes of
$M_i$.
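
A possible test derived from Property 4 is sketched below. It is a simplification under stated assumptions: it only tries the finest decomposition of M (its connected components) and gives up as soon as two components share a target value, whereas a coarser grouping might still be valid. The function name and data layout are illustrative.

```python
def sip_one_decomposition(mc_adj, domains):
    """Return the components M_1..M_m if the test succeeds, else None.

    mc_adj: adjacency sets of the morphism constraint graph (E_MC).
    domains: current domain (a set of target nodes) of each pattern variable.
    """
    free = {x for x in mc_adj if len(domains[x]) > 1}   # V'
    seen, comps = set(), []
    for start in sorted(free):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            comp.add(node)
            for nxt in mc_adj[node] & free:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        comps.append(comp)
    if len(comps) < 2:
        return None
    # alldiff part: the domain unions D(M_i) must be pairwise disjoint.
    used = set()
    for comp in comps:
        union = set().union(*(domains[x] for x in comp))
        if union & used:
            return None
        used |= union
    return comps

# Pattern path 0 - 1 - 2 with the middle variable already assigned.
mc_adj = {0: {1}, 1: {0, 2}, 2: {1}}
parts = sip_one_decomposition(mc_adj, {0: {"a", "b"}, 1: {"c"}, 2: {"d", "e"}})
blocked = sip_one_decomposition(mc_adj, {0: {"a", "b"}, 1: {"c"}, 2: {"b", "d"}})
```

In the second call the two components still compete for target node b, so the alldiff-constraint keeps them coupled.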

The above property states that the decomposition of $M$ is a necessary
condition. We can therefore design heuristics leading to the decomposition of $M$,
and hence, in some cases, to the decomposition of $P$.

A direct approach consists in detecting 1-decomposition at each node of the 
search tree. When the CSP becomes 1-decomposable in partial CSPs, those are 
computed separately in AND nodes. As shown in the experimental section, this
strategy proves to be much slower than a standard OR search tree. The reason 
is twofold: 

1. Decomposition is tested at every node of the search tree. Starting from the 
root node is useless, as a lot of computation time is lost. 

2. There is no guarantee that a decomposition will occur. 

Based on this observation, we present a hybrid approach combining the best
of the static and dynamic strategies.

The Hybrid Approach: As stated before, even a dedicated dynamic AND/OR
search, checking for decomposition on the reduced constraint graph only, is not
fast enough to compete with state-of-the-art SIP-solvers as implemented in the
vflib library. Therefore, we suggest a hybrid approach in order to fix this. The
idea is as follows:

1. calculate a static pseudotree heuristic on the reduced constraint graph 

2. apply a forward checking search following the pseudotree up to the first 
branching or until a fixed number of variables is assigned 

3. switch the strategy to dynamic AND/OR search with full AC-propagation 

This ensures that the expensive dynamic approach is only used once a
decomposition is available or at least likely after full propagation. Up to that
moment, a cheap forward checking approach is used for a fast inconsistency check
and a strong reduction of the reduced constraint graph.

In the following, we will give two dedicated heuristics we have applied in the 
preliminary forward checking procedure. 

3.3 Heuristics 

We now present two heuristics based on Property 3 aiming at reducing the
number of decomposition tests, and favoring decomposition. The general idea is to
first detect a subset of variables disconnecting the morphism constraint graph 
into disjoint components as it is a necessary condition for 1-decomposability. 
The search process will first distribute over these variables. The test of 1- 
decomposition is performed when all these variables are instantiated. It is also 
performed at the subsequent nodes of the search tree. 

The cycle heuristic (h1) The objective of the cycle heuristic is to find a set of
nodes $S$ in the morphism graph $CG_{MC} = (X, E_{MC})$ (see Def. 8) such that the
graph without those nodes is singly connected. When the variables associated to
$S$ are assigned, any subsequent assignment will decompose the morphism graph.
Finding the minimal set of nodes is known as the minimal cycle cutset problem
and is an NP-hard problem [6]. We propose here a simple linear approximation
that returns the nodes of the cycles of the graph. Algorithm 1 runs in $O(|V_p|)$.
The effectiveness of such a procedure on different classes of problems is shown
in the experimental section. One of its main advantages is its simplicity.

Using graph partitioning (h2) Graph partitioning is a well-known technique 
that allows hard graph problems to be handled by a divide and conquer approach. 
In our context, it can be used to separate the morphism constraint graph into 
two graphs of equal size. 

Definition 9. Given a graph $G = (V, E)$, a $k$-graph partitioning of $G$ is a
partition of $V$ in $k$ subsets, $V_1, \ldots, V_k$, such that
$V_i \cap V_j = \emptyset$ for $i \neq j$, $\bigcup_i V_i = V$, and the number of
edges of $E$ whose incident vertices belong to different subsets is minimized
(called the edgecut).

input : G = (X, E) the CG_MC
output: The nodes of the cycles of G

All ← X
T ← ∅
while (∃ n ∈ X | Degree(n) == 1) do
    T ← T ∪ {n}
    remove node n from G
end
return All \ T

Algorithm 1: Selection of the body variables.
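
A runnable sketch of Algorithm 1 is given below, assuming the morphism graph is stored as a dictionary of adjacency sets. A worklist of current leaves keeps the pruning linear in the size of the graph; the function name is illustrative.

```python
def cycle_nodes(adj):
    """Nodes left after repeatedly pruning degree-1 nodes (Algorithm 1).

    adj: dict of adjacency sets of an undirected graph; it is not modified.
    Each node and each edge is handled a constant number of times.
    """
    degree = {v: len(neigh) for v, neigh in adj.items()}
    removed = set()
    leaves = [v for v, d in degree.items() if d == 1]
    while leaves:
        leaf = leaves.pop()
        if leaf in removed or degree[leaf] != 1:
            continue
        removed.add(leaf)
        for nxt in adj[leaf]:
            if nxt not in removed:
                degree[nxt] -= 1
                if degree[nxt] == 1:
                    leaves.append(nxt)
    return set(adj) - removed

# A triangle a-b-c with a tail c-d-e: only the triangle survives.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e"}, "e": {"d"},
}
body = cycle_nodes(adj)
```

Like Algorithm 1 as written, the sketch only removes nodes of degree exactly 1, so a graph that is a pure tree leaves a single isolated node behind.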

Based on the edgecut of the morphism constraint graph, we can easily deduce
a subset of variables.

Definition 10. Given a 2-graph partitioning of G, a nodecut is a set of nodes 
containing one node of each edge in the cutset. 

Finding a minimum edgecut is an NP-hard problem for $k \ge 3$, but can be
solved in polynomial time for $k = 2$ by matching (see [8], page 209). However, we
use a fast local search approximation [11], as the exact minimum subset is not
needed.
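
Given a 2-partition (assumed here to be supplied by some external partitioner), a nodecut in the sense of Definition 10 can be derived greedily. The sketch below repeatedly picks the node covering the most uncovered cut edges; this is an illustrative choice, not necessarily the procedure used in our experiments, and the result is a valid nodecut but not necessarily a minimum one.

```python
def nodecut(edges, part_a):
    """Pick one endpoint of every cut edge of the bipartition (part_a, rest)."""
    part_a = set(part_a)
    # Cut edges have exactly one endpoint in part_a.
    cut = [(u, v) for u, v in edges if (u in part_a) != (v in part_a)]
    chosen = set()
    while cut:
        count = {}
        for u, v in cut:
            count[u] = count.get(u, 0) + 1
            count[v] = count.get(v, 0) + 1
        best = max(sorted(count), key=count.get)
        chosen.add(best)
        cut = [(u, v) for u, v in cut if best not in (u, v)]
    return chosen

edges = [(1, 2), (2, 3), (3, 4), (3, 5), (4, 5), (1, 3)]
cut_nodes = nodecut(edges, part_a={1, 2})
```

Here both cut edges (2, 3) and (1, 3) meet at node 3, so assigning the single variable 3 disconnects the two halves.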

4 Experiments 

Goals - The objective of our experiments is to compare our decomposition
method on different classes of SIP with standard CSP models as well as vflib,
the standard and reference algorithm for subgraph isomorphism [2]. We also
compare our decomposition method with standard direct decomposition. The
different heuristics presented in Section 3.3 are also tested.

Instances - The instances are taken from the vflib graph database described
in [7]. There are several classes of randomly generated graphs: random graphs,
bounded graphs and mesh graphs. The target graph has size $n$ and the
relative size of the pattern is denoted $\alpha$. For random graphs, the target
graph has a fixed number of nodes $n$ and there is a directed arc between two
nodes with a probability $\eta$. The pattern graph is also generated with the same
probability $\eta$, but its number of nodes is $\alpha n$. If the generated graph is
not connected, further edges are added until the graph is connected. For random
graphs, $n$ takes a value in [20, 40, 80, 100, 200, 400, 800, 1000], $\eta$ in
[0.01, 0.05, 0.1], and $\alpha$ in [20%, 40%, 60%]. There are thus 69 classes of
randomly connected graphs. In a class of instances denoted as si2-r001-m200, we
have $\alpha = 20\%$, $\eta = 0.01$, and $n = 200$ nodes.
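
The description above can be turned into a small generator sketch. It is not the generator distributed with the graph database; in particular, how the extra connecting arcs are chosen is an assumption made here, as are the function name and the fixed seed.

```python
import random

def random_connected_digraph(n, eta, rng):
    """Directed graph on n nodes, each arc present with probability eta.

    Extra arcs are then added until the underlying undirected graph is
    connected (the choice of these arcs is an assumption of this sketch).
    """
    arcs = {(u, v) for u in range(n) for v in range(n)
            if u != v and rng.random() < eta}

    def component_of(start):
        adj = {u: set() for u in range(n)}
        for u, v in arcs:
            adj[u].add(v)
            adj[v].add(u)
        seen, stack = {start}, [start]
        while stack:
            for nxt in adj[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    reached = component_of(0)
    while len(reached) < n:
        u = rng.choice(sorted(reached))
        v = rng.choice(sorted(set(range(n)) - reached))
        arcs.add((u, v))
        reached = component_of(0)
    return arcs

# Parameters of the class si2-r001-m200.
rng = random.Random(0)
n, eta, alpha = 200, 0.01, 0.20
target = random_connected_digraph(n, eta, rng)
pattern = random_connected_digraph(round(alpha * n), eta, rng)
```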

Mesh-k-connected graphs are graphs where each node is connected with its
$k$ neighborhood nodes. Irregular mesh-k-connected graphs are made of a regular
mesh with the addition of random edges uniformly distributed. The number of
added branches is $\rho n$. For random graphs, $n$ can take a value in
[16, ..., 1096], $k$ in [2, 3, 4], and $\rho$ in [0.2, 0.4, 0.6]. In an irregular
mesh-connected class of instances denoted as si2-m4Dr6-m625, we have
$\alpha = 20\%$, $k = 4$, $\rho = 0.6$ and $n = 625$ nodes.

One hundred graphs are generated for each class of instances. For random
graphs, we also generated 100 additional instances where the target graph has
1600 nodes, for each possible value of $\eta$ and $\alpha$. We used the generator
freely available from the graph database, following the methodology described in [7].

Models - Several models were considered for the experiments. First of all, we use
the available implementation of vflib. Then classical CP models are used, called
CPFC and CPAC. The model CPFC is a model where all the constraints use forward
checking and the variable selection selects the first variable which is involved in
the maximum number of constraints (called maxcstr) using minimal domain size
as tiebreaker. The model CPAC is similar except it uses an arc consistent version
of the MC constraint.

The model CP+Dec waits for 30% of the variables to be instantiated following
a variable selection policy (called minsize) selecting the uninstantiated
variable with the smallest domain. It then tests at each node of the search tree
whether decomposition occurs, using a maxcstr variable selection. The model
CP+Dec+h1 uses the cycle heuristic; once the nodes belonging to the cycles of the
pattern graph are instantiated using a minsize variable selection policy (up to
30% of the size of the pattern), decomposition is tested at each node of the
search tree and follows a maxcstr variable selection. The model CP+Dec+h2 uses the
graph partitioning heuristic; once the variables belonging to the nodecut set are
instantiated (up to 30% of the size of the pattern), decomposition is tested at
each node of the search tree and follows a maxcstr variable selection.
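
The two variable selection policies used by these models can be sketched as follows, assuming domains are stored as sets and constraints are given by their scopes. The function names are illustrative and do not come from the implementation used in the experiments.

```python
def select_min_domain(domains, assigned):
    """Smallest-domain policy: unassigned variable with the fewest values."""
    free = [x for x in domains if x not in assigned]
    return min(free, key=lambda x: len(domains[x]))

def select_max_constraints(domains, assigned, scopes):
    """First unassigned variable involved in the most constraints,
    with the smallest domain as tiebreaker."""
    free = [x for x in domains if x not in assigned]
    involved = {x: sum(x in scope for scope in scopes) for x in free}
    return min(free, key=lambda x: (-involved[x], len(domains[x])))

domains = {"x1": {1, 2, 3}, "x2": {1, 2}, "x3": {1, 2, 3, 4}}
scopes = [("x1", "x3"), ("x2", "x3"), ("x1", "x2", "x3")]

first = select_min_domain(domains, assigned=set())
second = select_max_constraints(domains, set(), scopes)
third = select_max_constraints(domains, {"x3"}, scopes)
```

In the last call x1 and x2 are involved in the same number of constraints, so the smaller domain of x2 decides.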

Setup - All experiments were performed on a cluster of 16 machines (AMD
Opteron(tm) 875 2.2 GHz with 2 GB of RAM) using the implementation of [Ti].
All runs are limited to a time bound of 10 minutes. In each experiment, we
search for all solutions. Experiments searching for one solution have also been 
done but are not reported here for lack of space. These experiments lead to the 
same conclusions. 

Description of the tables - Table [1] shows the results for random graphs and
Table [2] for irregular mesh-connected graphs. Each line describes the execution
of 100 instances from a particular class. The column N indicates the mean
number of solutions among the solved instances. The column % indicates the
number of instances that were solved within the time bound of 10 minutes. The
column μ indicates the mean time over the solved instances and the column
σ indicates the corresponding standard deviation. The column D indicates the
number of instances that used decomposition among the solved instances. The
column #D indicates the mean number of decompositions that occurred over all
solved instances. The column S indicates the mean size of the initial variable
set computed by the heuristics h1 or h2. Table [3] gives the mean degree and its
variance for the different instance classes. For each class of instances in
Tables [1] and [2], the results of the best algorithms are in bold.

Table 1. Randomly connected graphs, searching for all solutions.

Bench           N      vflib (% μ σ)  CPAC (% μ σ)  CPFC (% μ σ)
si2-r001-m200   61E+6  72 74 115      83 56 109     85 41 76
si2-r001-m400   17E+8  2 248 118      10 106 156    7 288 177
si2-r001-m800   28E+7  - -            11 220 136    1 153 -
si2-r001-m1600  2500   16 203 202     30 227 146    - -
si6-r01-m200    1      100 2 3        100 9 11      100 12 17
si6-r01-m400    1      66 99 133      89 156 116    50 190 137
si6-r01-m800    1      7 235 153      -             5 389 125
si6-r01-m1600   1      - -            -             39 499 51

Bench           N      CP+Dec (% μ σ D #D)  CP+Dec+h1 (% μ σ D #D S)  CP+Dec+h2 (% μ σ D #D S)
si2-r001-m200   61E+6  94 49 100 91 9244    98 6 40 98 1834 0.2       87 23 48 71 909 0.2
si2-r001-m400   17E+8  15 160 177 15 35655  75 68 125 75 2268 0.4     29 212 218 22 196 0.3
si2-r001-m800   28E+7  12                   4 227 254 4 21 0.6        12 256 239 8 0.6
si2-r001-m1600  2500   - -                  7 165 199 1 0.8           - - 0.9
si6-r01-m200    1      94 148 153           100 1                     100 1
si6-r01-m400    1      2 179 220            100 2 1 1                 100 4 6 1
si6-r01-m800    1      - -                  100 46 35 1               100 46 39 1
si6-r01-m1600   1      - -                  74 479 71 1               54 435 79 1
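
As an illustration of how the summary columns are defined, the following
sketch computes %, N, μ, σ, D and #D from per-instance records. The record
layout, the function name summarize, the time unit (seconds) and the use of
the population standard deviation are assumptions made for this example; the
text does not specify them.

```python
from statistics import mean, pstdev

TIME_LIMIT = 600.0  # 10-minute bound; seconds are an assumed unit


def summarize(runs):
    """Summarize one benchmark class.

    runs: list of (seconds, n_solutions, n_decompositions) tuples, one per
    instance; seconds is None when the instance hit the time limit.
    (Hypothetical record layout, chosen for this sketch.)
    """
    solved = [r for r in runs if r[0] is not None and r[0] <= TIME_LIMIT]
    if not solved:
        return {"%": 0.0, "N": None, "mu": None, "sigma": None,
                "D": 0, "#D": None}
    times = [r[0] for r in solved]
    return {
        "%": 100.0 * len(solved) / len(runs),     # share of solved instances
        "N": mean(r[1] for r in solved),          # mean number of solutions
        "mu": mean(times),                        # mean time, solved only
        "sigma": pstdev(times),                   # population std deviation
        "D": sum(1 for r in solved if r[2] > 0),  # instances that decomposed
        "#D": mean(r[2] for r in solved),         # mean decompositions
    }


if __name__ == "__main__":
    example = [(10.0, 4, 2), (30.0, 6, 0), (None, 0, 0), (None, 0, 0)]
    print(summarize(example))
```

All statistics except % are taken over the solved instances only, which is
why a class with a single solved instance has a mean time but no meaningful
deviation.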



Analysis - We start the analysis by looking at random graphs (see Table [1]).
We first compare vflib with the CP models CPFC and CPAC. For si2-r001-*
instances, the CPAC model is the best in mean time and in % of solved instances.
When the level of consistency is higher for the MC constraint, the search space
size diminishes, and all solutions are quickly found. For si6-r01-* instances,
CPAC is the best model for m200 and m400 instances, while CPFC is the best model
for m800 and m1600 instances. As shown in Table [3], the mean degree increases
with the size of the generated graph, which changes the effect of propagation.
The MC forward checking propagator is more efficient with denser graphs than an
arc consistent one. With sparse graphs, an arc consistent MC is cheap and
propagates a lot, while with denser graphs it is more efficient to wait for
instantiation to propagate.
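
The difference between the two propagation levels can be sketched as follows.
This is a minimal illustration, not the implementation used in the
experiments: it covers only the edge-preservation part of the MC constraint
(injectivity is ignored), and the function names and the dictionary-of-sets
graph representation are assumptions made for the example.

```python
def forward_check(pattern_adj, target_adj, domains, u, a):
    """Assign pattern node u to target node a, then prune only the domains
    of u's pattern neighbours to the target neighbours of a.
    Returns the new domains, or None if a domain becomes empty."""
    new = {v: set(d) for v, d in domains.items()}
    new[u] = {a}
    for v in pattern_adj[u]:
        new[v] &= target_adj[a]
        if not new[v]:
            return None
    return new


def arc_consistency(pattern_adj, target_adj, domains):
    """Remove, for every pattern edge (u, v), each value of D(u) that has
    no adjacent value left in D(v); repeat until nothing changes.
    Returns the new domains, or None if a domain becomes empty."""
    new = {v: set(d) for v, d in domains.items()}
    changed = True
    while changed:
        changed = False
        for u in pattern_adj:
            for v in pattern_adj[u]:
                supported = {a for a in new[u] if target_adj[a] & new[v]}
                if supported != new[u]:
                    new[u] = supported
                    changed = True
                    if not supported:
                        return None
    return new


if __name__ == "__main__":
    pattern = {0: {1}, 1: {0, 2}, 2: {1}}             # path 0 - 1 - 2
    target = {0: {1}, 1: {0, 2}, 2: {1}, 3: set()}    # path plus isolated 3
    doms = {v: {0, 1, 2, 3} for v in pattern}
    print(arc_consistency(pattern, target, doms))
    print(forward_check(pattern, target, doms, 1, 1))
```

Forward checking touches only the neighbours of the variable just assigned,
whereas the arc consistent version revisits every pattern edge until a
fixpoint is reached. Each revision scans the neighbourhoods of the candidate
target nodes, so its cost grows with the degree of the graphs.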

We now look at the use of decomposition for random graphs (second table
in Table [1]). The first model CP+Dec, which corresponds to a decomposition
approach that uses the whole constraint graph only, fails. This model cannot
take into account the structure of the problem. This can be measured through
the quality of the decomposition.

Table 2. Irregular meshes, searching for all solutions.

Bench            N      vflib (% μ σ)  CPAC (% μ σ)  CPFC (% μ σ)
si2-m4Dr6-m625   88E+5  89 23 50       94 21 38      95 6 27
si2-m4Dr6-m1296  17E+7  16 135 137     33 178 123    38 107 154
si6-m4Dr6-m625   3.31   100 7 43       100 29 4      100 9 4
si6-m4Dr6-m1296  10.38  100 13 55      100 233 30    100 113 65

Bench            N      CP+Dec (% μ σ D #D)  CP+Dec+h1 (% μ σ D #D S)  CP+Dec+h2 (% μ σ D #D S)
si2-m4Dr6-m625   88E+5  35 223 151 35 0.7    100 6 22 96 5.4 0.5       94 6 21 88 5.5 0.3
si2-m4Dr6-m1296  17E+7  3 120 36 3 0.1       63 67 109 63 4 0.5        49 163 170 49 3.9 0.5
si6-m4Dr6-m625   3.3    8 105 32             100 7 3 6 0.1 0.8         100 22 26 6 0.1 0.7
si6-m4Dr6-m1296  10.3   - -                  100 65 20 41 0.6 0.7      77 223 161 29 0.4 0.7

Table 3. Mean degree for the tested graph set.

Bench            degree (μ σ)
si2-r001-m200    2.30 0.14
si2-r001-m400    2.89 0.14
si2-r001-m800    3.99 0.18
si2-r001-m1600   6.80 0.19
si6-r01-m200     3.29 0.14
si6-r01-m400     5.27 0.16
si6-r01-m800     9.76 0.15
si6-r01-m1600    19.20 0.17
si2-m4Dr6-m625   3.51 0.26
si2-m4Dr6-m1296  3.53 0.20
si6-m4Dr6-m625   5.12 0.16
si6-m4Dr6-m1296  5.19 0.14
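
To make the notion of a decomposition concrete, the sketch below groups the
unassigned pattern variables into independent subproblems. It assumes that
two unassigned variables interact if they are adjacent in the pattern graph
or if their domains still overlap (so that the all-different requirement
between them is not yet entailed). The function name, this interaction test
and the data layout are assumptions made for the example and are not taken
from the implementation used in the experiments.

```python
def independent_components(pattern_adj, domains):
    """Return the groups of unassigned variables that can be solved
    independently, as a sorted list of sorted lists.
    A variable counts as assigned when its domain is a singleton."""
    free = [v for v, d in domains.items() if len(d) > 1]
    parent = {v: v for v in free}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for i, u in enumerate(free):
        for v in free[i + 1:]:
            # Assumed interaction test: pattern edge or overlapping domains.
            if v in pattern_adj[u] or domains[u] & domains[v]:
                parent[find(u)] = find(v)

    groups = {}
    for v in free:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Pattern path 0 - 1 - 2 - 3 - 4 with the middle node already assigned.
    pattern = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
    doms = {0: {1, 2}, 1: {1, 2, 3}, 2: {10}, 3: {20, 21}, 4: {21, 22}}
    print(independent_components(pattern, doms))
```

Under this test, the problem splits only once the remaining domains have
become disjoint across the pattern parts, which is why instantiating a
well-chosen initial set of variables matters for the heuristics.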

First, we focus on the si2-r001-* classes. The models CP+Dec+h1 and
CP+Dec+h2 achieve better decompositions than the CP+Dec model. Even though
CP+Dec tends to induce more decompositions, the number of instances using
decomposition (see column D) is higher for CP+Dec+h1 and CP+Dec+h2 than for
CP+Dec. This illustrates the computational overhead of a pure dynamic
decomposition approach. However, the number of instances using decomposition
tends to zero for m1600 instances. This is because the graphs have higher
degrees as their size increases (see Table [3]). This can be observed in
column S: the size of the initial subset of variables to instantiate gets
closer to 100% as the size increases. For this reason, our decomposition method
is beaten by the CPAC model for si2-r001-m1600.
We now focus on the si6-r01-* classes. As stressed earlier, those instances
have denser graphs. The initial set of variables to instantiate is the whole set
of pattern nodes for CP+Dec+h1 and CP+Dec+h2, so no decomposition occurs. Why,
then, do the CP+Dec+h* models outperform all other methods on those classes?
Because the CP+Dec+h* models use a minsize variable selection policy, whereas
CPFC uses maxcstr. In the class si6-r01-*, the CP+Dec+h1 approach thus reduces
to a CPFC with a minsize variable selection policy.
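
A minsize policy can be sketched as below, under the assumption that minsize
means choosing the unassigned variable with the smallest current domain; the
function name and the tie-breaking by variable identifier are choices made
for this example only.

```python
def select_minsize(domains):
    """Pick the unassigned variable with the smallest domain.
    Ties are broken by the smaller variable identifier (an arbitrary choice).
    Returns None when every variable is already assigned."""
    candidates = [(len(d), v) for v, d in domains.items() if len(d) > 1]
    return min(candidates)[1] if candidates else None


if __name__ == "__main__":
    print(select_minsize({0: {1, 2, 3}, 1: {4}, 2: {5, 6}}))
```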

For random graphs, the decomposition method with heuristics is especially
useful for sparse graphs with many solutions, while a CPFC model using a minsize
variable selection policy seems the best choice for denser graphs with few
solutions. vflib is clearly outperformed on all these classes of instances.
Experiments on the other classes of random graphs, not reported here for lack
of space, confirmed this analysis.

We now analyze irregular mesh-connected graphs. We observe in Table [3] that
the mean degree of the si2-m4Dr6-* classes is lower than for the si6-m4Dr6-*
classes. We first compare vflib and the CP models without decomposition.
For the sparser si2-m4Dr6-* classes, CPFC is the best method, while for the
denser si6-m4Dr6-* classes, vflib is the best. We have no particular
explanation for this behavior; it remains an open question. Regarding
decomposition methods, the same remarks as for random graphs apply. The CP+Dec
model tends to produce fewer decompositions than the CP+Dec+h* models. Moreover,
the CP+Dec+h* models are the best models for sparser instances with many
solutions. As the mean degree of the instances increases (see Table [3]), the
decomposition methods become less efficient. Indeed, for si6-m4Dr6-m1296, the
best method is vflib, but our decomposition approach also solves all the
instances and helps CP reduce the mean time.

Summary - The application of the standard direct decomposition method CP+Dec
leads to worse performance than the direct application of standard CP models
(CPFC, CPAC) and vflib. On most classes, the cycle heuristic (h1) is better than
the graph partitioning heuristic (h2). On sparse randomly connected graphs
with many solutions, and on sparse irregular meshes, our decomposition method
outperforms standard CP approaches as well as vflib. For denser randomly
connected graphs, CP models (CPAC or CPFC with a minsize policy) outperform
vflib. For denser irregular meshes, vflib, the standard CP models and our
decomposition method solve all the instances, but vflib is more efficient.

5 Conclusion 

Our initial question was to investigate the application of decomposition
techniques as AND/OR search for problems with global constraints, in particular
for the SIP. We showed that it is indeed possible using a hybrid approach of
static and dynamic techniques and a dedicated problem structure analysis. For
the SIP, one can derive a decomposition-enforcing static heuristic that is used
by a cheap forward checking approach. As soon as the problem becomes (likely)
decomposable, the search process is switched to a fully propagated, dynamically
decomposed search. This exploits the non-predictable reduction of the constraint
graph structure via constraint propagation and entailment but reduces the huge
computational effort of a completely propagated search. We showed that our
hybrid decomposition approach is able to beat the state-of-the-art VF-algorithm
for sparse graphs with high solution numbers. As future work, we would like
to investigate more heuristics for the SIP, as the heuristic influences the
quality of decomposition. Moreover, we intend to investigate the use of our
decomposition method for motif discovery, where solving the SIP is used as an
enumeration tool [§].